Welcome Reviewer(s)!

This final project draft is still in very rough shape, but I hope what I have makes sense. Thanks in advance for your comments and your time!

Heritage Language Maintenance

What is a heritage language and why do we need to maintain it?

[some stuff will go here]

How is a heritage language learner different from a second language learner?

[some more stuff will go here]

Why does it matter?

[and here]

What is the purpose of this project?

The visualizations below are intended to convey information about enrollment trends in non-English language courses at higher education institutions in the United States. The data comes from a national survey that the Modern Language Association conducted periodically between 1958 and 2016. The survey gathers information about each individual institution, including languages offered, enrollment numbers, institution type, geographic information, history of the institution’s name, and accreditation.

I chose this particular dataset, which does not highlight information about heritage language programs, mainly because that data does not exist at this level. Existing research and existing heritage language programs are limited, and both suffer from obscurity. Higher education institutions that do opt to offer heritage language courses often house them within larger modern language departments as a series of courses taken before moving into advanced language courses with second-language learners, which can significantly limit potential enrollment and interest. How do you pursue a program you don’t even know exists?

This presents a few problems. First, we don’t have a real idea of how many institutions currently offer any type of heritage language support system. Second, even where such systems exist, we don’t know how often they’re used, what kind of interest they generate among students, what the major goals and interests of current programs are, or what characterizes current heritage language pedagogical practices, beyond limited case studies that are not generalizable on their own.

The hope is that this data offers a starting point. Though we can’t make any inferences about heritage learners specifically, these enrollment trends can show us which languages already have stable or growing enrollment, so that adding support systems for them could be more easily justified and implemented. They also show which geographic areas of the country have notably higher language enrollment, which is one way to select an area for closer scrutiny. Given the limitations of this data, it would not be responsible to make sweeping generalizations about the types of programs that should or should not be implemented. Rather, the data gives us a starting point from which to argue for further data collection that can help us understand a given institution’s language program needs as reflected by student utilization and motivation. One potential avenue for research, I would argue, is characterizing the student data gathered for potential recruitment efforts. Another is cross-institutional collaboration between language departments and other departments to create course offerings that supplement student learning in other areas through non-English languages.

[move this somewhere else] As it relates to the topic of Spanish HLPs, the data is encouraging: SC accounts for such a large portion of LCs that further research into Spanish language programs is well founded.

Visualization 1:

This visualization highlights enrollment trends for non-English language courses in higher education institutions. It displays a combined total across all surveyed U.S. institutions, including public and private 2-year and 4-year institutions. I ranked the top 10 languages for each survey year, and the aim was an animation that shows how those rankings change over time while also making it very clear that Spanish has dominated, and continues to dominate, enrollment numbers. Spanish heritage language programs are my specific area of focus.
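The ranking step described above can be sketched in base R. This is only an illustration under stated assumptions: the column names (`year`, `language`, `students`) and every number are made up, and `top_n` is set to 2 so the toy output stays small (it would be 10 for the real chart).

```r
# Toy data: the language names are real, but every number here is made up.
enroll <- data.frame(
  year     = c(2013, 2013, 2013, 2016, 2016, 2016),
  language = c("Spanish", "French", "German", "Spanish", "French", "German"),
  students = c(800, 200, 90, 700, 170, 180),
  stringsAsFactors = FALSE
)

# Rank languages within each survey year (1 = largest enrollment).
# Negating the counts makes rank() order from largest to smallest.
enroll$rank <- ave(-enroll$students, enroll$year,
                   FUN = function(x) rank(x, ties.method = "first"))

# Keep only the top languages per year, sorted for readability.
top_n <- 2  # hypothetical; 10 for the actual visualization
top <- enroll[enroll$rank <= top_n, ]
top <- top[order(top$year, top$rank), ]
print(top)
```

Because the rank is recomputed for each year, a language can enter or leave the top group between survey years, which is the movement the animation is meant to show.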

The Journey

Visualization 1 comes from humble origins. My first attempt at working with the data was a series of simple static bar charts that showed enrollment numbers across years. Since no one wants to look at a series of boring bar charts, I decided to try my hand at animation in hopes that it would still clearly show the overall trend without taxing the viewer too much. To accomplish this I was forced to pare down my data from 1958-2016 to 1980-2016 to get around a lot of errors I kept coming up against when piping into ggplot.

Viz 1 still has a long way to go. From here, I plan to tweak colors and themes to make it more visually appealing and easier to read. I’m fairly happy with the speed, but there are a lot of small things that need to be adjusted. The y-axis, for one, needs a lot of work, and the numbers on it need thousands separators (commas). As a reviewer, there is one specific thing it would be really helpful for you to weigh in on: is this animation easier to understand if the label next to each bar displays total enrollment or a proportion (the percentage of total non-English language enrollment that each language accounts for)?
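To make the two label options concrete, here is a minimal base R sketch that builds both kinds of label for a single survey year. The enrollment numbers are invented for illustration, and the variable names are hypothetical.

```r
# Hypothetical enrollments for one survey year (numbers are made up).
students <- c(Spanish = 712000, French = 176000, German = 81000)

# Option A: total enrollment with thousands separators.
label_total <- format(students, big.mark = ",", trim = TRUE, scientific = FALSE)

# Option B: each language's share of the total, as a percentage.
label_share <- sprintf("%.1f%%", 100 * students / sum(students))

print(data.frame(language = names(students), label_total, label_share))
```

For the axis itself, ggplot2 users commonly pass a labeller from the scales package, for example `scale_y_continuous(labels = scales::label_comma())`, instead of formatting the numbers by hand.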

Viz 1 Draft

Visualization 2:

For this visualization I am creating a U.S. map color coded by density of students enrolled in language courses. The data is from 2016, the most recent year I was able to find for both the Modern Language Association (MLA) and National Center for Education Statistics (NCES) datasets, which were joined together to create the map.

The Journey

I suppose this visualization’s journey is much harder to see; it’s really been about the data behind the scenes. Though it looks like a large beige continental U.S.-shaped blob right now, I spent a lot of time navigating unfamiliar terrain to track down the different components of NCES data I needed for my vision. I had a big “I FINALLY UNDERSTAND JOINS!” moment once I was able to see how I needed to wrangle and join the datasets I had in order to compile geographic, characteristic, and enrollment data into one data frame for mapping. This initial draft shows my very first attempt at any type of map visualization. As a result, it’s got a lot of holes, but I’m still celebrating that I was able to get anything that resembles the United States. I am hoping next week’s lecture on this subject helps me get a lot further along with this visualization than I’ve been able to get so far.
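For readers who have not had their own joins moment yet, the idea can be sketched with base R’s `merge()`. Everything here is an assumption for illustration: the key column `unit_id`, the other column names, and all values are made up and are not the real MLA or NCES fields.

```r
# Two made-up tables that share a hypothetical institution ID (`unit_id`).
mla <- data.frame(
  unit_id = c(1, 2, 3),
  lang_enrollment = c(120, 45, 300),
  stringsAsFactors = FALSE
)
nces <- data.frame(
  unit_id = c(1, 2, 4),
  state = c("OR", "WA", "CA"),
  total_enrollment = c(2400, 900, 15000),
  stringsAsFactors = FALSE
)

# Inner join: keep only institutions that appear in both tables.
joined <- merge(mla, nces, by = "unit_id")

# Left join: keep every row of `mla`; unmatched NCES columns become NA.
joined_left <- merge(mla, nces, by = "unit_id", all.x = TRUE)

print(joined)
print(joined_left)
```

Comparing the row counts of the two results is a quick way to see how many institutions fail to match, which may help explain holes in a map.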

The next step for the visualization is to actually make it do what it’s supposed to do. States will be colored according to the proportion of students enrolled in non-English language courses relative to total institutional enrollment for the 2016-2017 academic year. I calculated state averages of this proportion, and these will set each state’s color. I’ll also figure out how to add Hawaii and Alaska at some point, and I’m not sure at the moment what to do with data from D.C., so that is also on my to-do list.
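A state-level proportion can be computed in more than one way, so here is a small base R sketch of two options. The data frame, its column names, and its numbers are all hypothetical, and the pooled version is shown only as a point of comparison with the state average described above.

```r
# Made-up institution-level data; column names are hypothetical.
inst <- data.frame(
  state = c("OR", "OR", "WA", "DC"),
  lang_enrollment = c(120, 60, 45, 30),
  total_enrollment = c(2400, 600, 900, 300),
  stringsAsFactors = FALSE
)

# Proportion of each institution's students enrolled in language courses.
inst$prop <- inst$lang_enrollment / inst$total_enrollment

# Option 1: average the institution-level proportions within each state
# (every institution counts equally, regardless of size).
state_mean <- aggregate(prop ~ state, data = inst, FUN = mean)

# Option 2: pool enrollments first, then divide
# (larger institutions carry more weight).
sums <- aggregate(cbind(lang_enrollment, total_enrollment) ~ state,
                  data = inst, FUN = sum)
sums$pooled_prop <- sums$lang_enrollment / sums$total_enrollment

print(state_mean)
print(sums)
```

The two options can disagree when institutions in a state differ a lot in size, so it is worth stating in the figure caption which one the map colors represent.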

If I have enough time and am able to learn how, I hope to make the map interactive by allowing zoom in/out, plotting institutions, and creating a hover feature that gives institution name, total students enrolled, proportion of students in language courses, and top 5 languages being studied with numbers displayed. I do wonder if that’s too much information for this plot though, and in full transparency, it feels far too ambitious at the moment.

Viz 2

Visualization 3:

Oh, boy. This thing.

Visualization 3 is similar in thought to the first visualization. It’s meant to present trends over time, but this one is centered on institution type rather than language. The major divisions of public/private and 2-year/4-year+ are presented, and in the interest of readability I cut out the other variables I was originally interested in, including a breakdown by degree type.

The Journey

Do I even want to tell you about this journey? It’s been a rough one just getting to the draft point. I had big dreams when I started. I had all sorts of ideas about flowing Sankey charts à la NYT (you know the one). But it was not meant to be. Maybe in another time, in another world, maybe with some more EDS courses under my belt…

I did find a really cool blog post that walks through recreating the animated Sankey chart, and I attempted to follow along with it, but I wasn’t able to wrap my head around their manipulation of functions and data enough to apply it for my own purposes.

My second choice was to use geom_dotplot(), thinking I could animate an effect of dots bouncing in or out as count changes across years. What actually happened was a graph with a huge stack of dots so tall you couldn’t actually see where they ended, which I was unable to fix. So I went with a safe choice instead, choosing geom_jitter() as the base for my animation. At least, I thought it was the safe choice, but I’ve yet to figure out how to get this visualization to display count in the manner I want.

Next steps include extensive theming and figuring out how to slow the animation speed way down.

Viz 3